Adaptive Chirplet Transform-Based Machine 
Learning for P300 Brainwave Classification 


1st Aman Bhargava
Faculty of Engineering Science 
University of Toronto 
Toronto, Ontario 
aman.bhargava@mail.utoronto.ca 


2nd Steve Mann
Faculty of Electrical and Computer Engineering
University of Toronto
MannLab Canada
Toronto, Ontario
mann@eecg.toronto.edu


Abstract —Within brain-computer interface (BCI) research, 
classification of event-related potentials (ERPs) is of great
interest as a method of understanding the brain, as well as for
human-computer communication. In particular, investigation of
P300 brainwaves is relevant due to their ease of use in the context
of BCI. Though high-accuracy models have been developed
for individual subjects under laboratory-controlled conditions,
further work is required to improve the robustness of the models,
especially under more varied conditions.

Here we propose an adaptive chirplet transform (ACT) algorithm
coupled with an artificial neural network for robust
P300 classification. In a comparison of feature extraction methods,
the proposed method and conventional downsampling yielded
accuracies of 80% and 73%, respectively, when a neural network
model was fit to a collection of 16 subjects' P300 data. Meanwhile,
accuracies of up to 100% could be achieved (depending on the
subject) when the model was trained and validated on
subject-specific datasets with the same averaging techniques.

This investigation makes it clear that the adaptive chirplet 
transform holds promise for addressing issues of reliability and 
robustness in P300 BCI and other related brain signal processing 
tasks. 

Index Terms —adaptive chirplet transform, ACT, machine 
learning, electroencephalography, event-related potentials, P300, 
chirplet transform, EEG 

I. Introduction 

The P300 event-related potential is a positive deflection 
in electrical potential as measured via electroencephalogram 
(EEG) occurring roughly 300 milliseconds after particular stimulus
types and cognitive processes [1]. It is most strongly observed
over the parietal lobe. The most common method for eliciting 
the P300 wave is the oddball paradigm where a user is asked 
to scan for an uncommon stimulus among a sequence of 
common stimuli [2]. For example, a user may be asked to 
count the number of occurrences of the infrequent letter ‘O’ 
out of a series of ‘O’s and ‘Q’s flashed on a screen. The P300 
wave appears to be related to cognitive processes involving 
recognition and decision making [3]. 

Due to the ease with which it can be elicited, P300 has been 
a common tool in brain-computer interface (BCI) systems. The 
P300 speller is one such BCI [4]. In this system, a matrix 
of letters is flashed as a user counts the number of flashes 
for a particular letter they wish to communicate. A computer
scores the brainwaves associated with the time when each 
letter was flashed in terms of P300 similarity, and the one 
with the greatest similarity is classified as the target letter for 
the period. 

Previous studies on P300 classification have utilized a variety
of methods for reducing the high dimensionality of the raw
input signals while retaining the relevant P300 information.
This is of importance as the raw feature vectors that arise 
from the EEG voltage sampling (usually at or above 128Hz 
sampling rate) have incredibly high dimensionality and tend to 
lead to over-fitting in conventional machine learning classifiers 
[5]. Techniques such as discrete wavelet transform (DWT), 
Fourier analysis, correlation thresholding, and downsampling 
have been successfully utilized in the past [6] [7]. That said, 
there is still room for improvement in the signal processing 
of the raw feature vectors given the lack of reliability and 
robustness for P300 BCI outside the context of a laboratory 
[7]. Due to the low signal-to-noise ratio (SNR) of EEG recordings,
signal averaging is often employed to clarify the P300 signal
over multiple elicitations. The technique can drastically improve
classification accuracy, but it slows the net rate of information
transfer due to the increased number of elicitation periods
required [8].

The chirplet transform has yet to be applied to P300
analysis and classification, making the present investigation
highly relevant to the future of P300 and BCI research. Here,
we propose an Adaptive Chirplet Transform algorithm for
generating sparse representations of input waveforms that can be
used as dimensionality-reduced feature vectors for an artificial
neural network to more accurately classify ERPs as either
P300 or non-P300. We hypothesize that a sparse chirplet-based
representation of ERPs can more effectively elucidate relevant
characteristics of the waveform as compared to conventional
methods, leading to greater classification accuracy.

II. Materials and Methods 
A. Chirplet Transform 

A chirp function is an oscillating function whose frequency
varies linearly with some independent variable (e.g. time).
A chirplet is a windowed chirp function that arises from
the multiplication of a chirp function by a time-localized
envelope function, much like how a wavelet is a windowed
wave function [9]. The Gaussian chirplet family used in this
paper, $g_{t_c, f_c, \log\Delta_t, c}(t)$, is defined by four
parameters: $t_c$ is the time center, $f_c$ is the frequency center,
$\Delta_t$ scales the duration of the chirplet window, and $c$ is
the chirp rate [10]. Its defining equation contains two
exponentials: the first is a Gaussian envelope and the second
yields a complex-valued chirp function.
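
For reference, the Gaussian chirplet is commonly written in the following unit-energy form, which matches the parameter description given here. The normalization constant and the exact chirp-rate convention (some authors write $c/2$ in place of $c$) are assumptions rather than details confirmed by this text:

```latex
g_{t_c, f_c, \log\Delta_t, c}(t) =
  \frac{1}{\sqrt{\sqrt{\pi}\,\Delta_t}}
  \exp\!\left[-\frac{1}{2}\left(\frac{t - t_c}{\Delta_t}\right)^{2}\right]
  \exp\!\left\{ j 2\pi \left[ c\,(t - t_c)^{2} + f_c\,(t - t_c) \right] \right\}
```

With this normalization, $\int |g(t)|^2 \, dt = 1$ for any choice of the four parameters.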

The original non-adaptive chirplet transform is computed by
taking the inner product of an input signal $s(t)$ with the
parameterized chirplet $g_{t_c, f_c, \log\Delta_t, c}(t)$ over the
chirp parameter space to determine a coefficient distribution
$a_{t_c, f_c, \log\Delta_t, c}$ [9]:

$$a_{t_c, f_c, \log\Delta_t, c} = \int s(t)\, g^{*}_{t_c, f_c, \log\Delta_t, c}(t)\, dt = \langle s \mid g \rangle \qquad (2)$$

where $g^{*}$ is the complex conjugate of $g$.
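
The inner product in (2) can be approximated on sampled data with a Riemann sum. The sketch below is a minimal illustration and not the authors' code: the function names are hypothetical, and the chirplet is assumed to take the common unit-energy Gaussian form (normalization and chirp-rate convention are assumptions).

```python
import numpy as np

def gaussian_chirplet(t, tc, fc, c, dt):
    """Sample an (assumed) unit-energy complex Gaussian chirplet at times t."""
    envelope = np.exp(-0.5 * ((t - tc) / dt) ** 2)           # Gaussian window
    chirp = np.exp(2j * np.pi * (c * (t - tc) ** 2 + fc * (t - tc)))
    return envelope * chirp / np.sqrt(np.sqrt(np.pi) * dt)   # normalization

def chirplet_coefficient(s, t, tc, fc, c, dt):
    """Riemann-sum approximation of a = integral of s(t) g*(t) dt."""
    g = gaussian_chirplet(t, tc, fc, c, dt)
    return np.sum(s * np.conj(g)) * (t[1] - t[0])

fs = 128.0                  # sampling rate reported for the dataset
t = np.arange(76) / fs      # 76 samples, roughly 600 ms

# Test signal: a single chirplet, so the matching coefficient should be ~1.
s = gaussian_chirplet(t, tc=0.3, fc=5.0, c=2.0, dt=0.05)
a_match = chirplet_coefficient(s, t, tc=0.3, fc=5.0, c=2.0, dt=0.05)
a_other = chirplet_coefficient(s, t, tc=0.3, fc=12.0, c=2.0, dt=0.05)
```

A chirplet whose parameters match the signal yields a coefficient of magnitude close to one, while a chirplet with a different frequency center yields a smaller magnitude. That difference in magnitude is what the adaptive method uses to rank candidate chirplets.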

B. Adaptive Chirplet Transform 

While there is much to be gained from analyzing a signal
using the above continuous chirplet transform equations,
it is often helpful to sparsely express an input signal as
the linear combination of a relatively small set of chirplets
with the greatest possible similarity to the initial function.
There are some situations where a densely populated chirplet
space provides important insight, but in others the sparse
representation is more useful. For instance, one can easily
deduce important properties of an input signal from a sparse
representation, assuming that the input signal has periodic
characteristics [10]. If a signal has a strong component whose
frequency increases with time (‘up-chirping’) with
a particular center frequency, time center, and duration, an
optimal sparse representation would contain a corresponding
chirplet with the same frequency, time, and duration. If the
representation were not sparse, that component might be
distributed across multiple weaker chirplet components, making
analysis less straightforward.

The adaptive chirplet transform was initially proposed by
Mann and Haykin in 1992 [11] as a method of reducing the
high-dimensional output space of the non-adaptive chirplet
transform. Their methodology utilized logon expectation
maximization (LEM) for determining a sparse representation of
an input signal, which was adapted by Cui in 2006 in his
investigations on visual evoked potentials (VEP) [10].

Finding the optimal sparse chirplet representation of a signal is
an NP-hard problem, so non-optimal approximation methods are used.
We utilize the following mathematical tools from [10] to assist in
this process:

• $I = (t_c, f_c, c, \Delta_t)$ represents the parameters for a given
chirplet ($g_I$) or the corresponding coefficient ($a_I$).

• $P$ represents the order of a given chirplet approximation
of an input signal (how many chirplets are used to
approximate the signal).

• $f_P(t) = \sum_{n=1}^{P} a_{I_n} g_{I_n}(t)$ is the $P$th order
approximation of the input signal $f(t)$.

• $R^{P+1} f(t) = f(t) - f_P(t)$ is the residue of the input signal
that remains after the $P$th order approximation.

The adaptive chirplet transform algorithm proposed and
utilized in this investigation is based heavily on the algorithm
proposed by Cui in The Adaptive Chirplet Transform and
Visual Evoked Potentials [10]. It is a greedy algorithm that
works as follows:

1) Construct a chirplet dictionary that has a relevant
sampling of the 4-dimensional parameter space.

2) Instantiate $P := 1$; $R^{1} f(t) := f(t)$.

3) Utilize the matching pursuit (MP) algorithm to deduce
a coarse estimate for the chirplet parameter $I_P$ in the
chirplet dictionary that maximizes the value of $|a_{I_P}|$.

4) Utilize the Broyden-Fletcher-Goldfarb-Shanno (BFGS)
algorithm to further optimize the value of $|a_{I_P}|$ until
convergence ($\nabla |a_{I_P}| = 0$).

5) Set $R^{P+1} f(t) := R^{P} f(t) - a_{I_P} g_{I_P}(t)$.

6) Increment $P$ and repeat steps 3-5 until the residue
$R^{P} f(t)$ falls below a threshold or until a predetermined
value of $P$ is reached.
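
The greedy loop above can be sketched with NumPy as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: it covers steps 1, 2, 3, 5 and 6 only, omits the BFGS refinement of step 4, uses an assumed Gaussian chirplet form, and all function names and grid values are hypothetical. Residue size is measured here by its Euclidean norm, which is also an assumption.

```python
import numpy as np

def gaussian_chirplet(t, tc, fc, c, dt):
    """Assumed Gaussian chirplet form (normalized later, per atom)."""
    envelope = np.exp(-0.5 * ((t - tc) / dt) ** 2)
    chirp = np.exp(2j * np.pi * (c * (t - tc) ** 2 + fc * (t - tc)))
    return envelope * chirp

def build_dictionary(t, tcs, fcs, cs, dts):
    """Step 1: sample the 4-D parameter space on a coarse grid."""
    params = [(tc, fc, c, dt) for tc in tcs for fc in fcs
              for c in cs for dt in dts]
    atoms = np.array([gaussian_chirplet(t, *p) for p in params])
    atoms /= np.linalg.norm(atoms, axis=1, keepdims=True)   # unit-norm rows
    return params, atoms

def matching_pursuit(f, atoms, max_order, tol):
    """Steps 2, 3, 5, 6: greedy selection without BFGS refinement."""
    residue = np.asarray(f, dtype=complex)       # R^1 f := f
    chosen, coeffs = [], []
    for _ in range(max_order):
        a = atoms.conj() @ residue               # a_I = <R^P f | g_I>, all I
        best = int(np.argmax(np.abs(a)))         # coarse estimate of I_P
        chosen.append(best)
        coeffs.append(a[best])
        residue = residue - a[best] * atoms[best]    # R^{P+1} f
        if np.linalg.norm(residue) < tol:
            break
    return chosen, coeffs, residue

t = np.arange(76) / 128.0
params, atoms = build_dictionary(
    t, tcs=[0.2, 0.4], fcs=[4.0, 8.0], cs=[0.0, 5.0], dts=[0.05, 0.1])
f = 2.0 * atoms[5] + 0.5 * atoms[10]             # synthetic two-chirplet signal
chosen, coeffs, residue = matching_pursuit(f, atoms, max_order=16, tol=1e-8)
```

Because the atoms have unit norm, each iteration removes $|a_{I_P}|^2$ from the squared norm of the residue, which is why picking the largest $|a_{I_P}|$ is the greedy choice. Real EEG signals are real-valued, whereas this sketch keeps the residue complex for simplicity.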

Unlike in Cui’s manuscript, there is no expectation-maximization
(EM) loop in the process because the BFGS
optimizer was able to reliably converge to optimal values of
$a_{I_P}$ in step 4 such that $\nabla |a_{I_P}| = 0$ in this
particular use case of P300 signal decomposition. For this reason,
EM iterations that attempt to further optimize earlier chirplets
in the set would have no effect on the parameter estimates because
the optimizer had already converged.

C. Conventional Feature Extraction 

The alternative feature extraction method used in this
investigation was the downsampling method. This method involves
the selection of a desired number of dimensions d. The input
signal is then split into d equally-sized partitions, and the
average signal value within each partition is calculated and
packaged into a feature vector.
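
A minimal sketch of this partition-averaging step is shown below. The function name is hypothetical, and the use of near-equal partitions when the signal length is not divisible by d (for example, 76 samples and d = 8) is an assumption, since the text does not say how that case was handled.

```python
import numpy as np

def downsample(signal, d):
    """Average the signal within d contiguous, near-equal partitions."""
    parts = np.array_split(np.asarray(signal, dtype=float), d)
    return np.array([p.mean() for p in parts])

# 16 samples into 8 partitions: each feature is the mean of one pair.
features = downsample(np.arange(16.0), d=8)
```

`np.array_split` allows partitions to differ in length by at most one sample, so a 76-sample epoch is split into four partitions of 10 samples and four of 9.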

D. Machine Learning Methods 

In order to compare the two feature extraction methods, a
set of machine learning algorithms was trained on each dataset.
These consisted of a logistic regression classifier, a multilayer
perceptron (neural network) classifier, and a K-nearest centroid
algorithm. The neural network algorithm had an input layer
with the same number of neurons as the dimensionality of the
given input, one hidden layer with 5 neurons, and one
output neuron for classification. Model selection and alternative
feature extraction methods were largely informed by previous
work on wavelet-based machine learning classifiers for P300
ERPs by Vareka and Mautner in 2013 [6].
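
The network shape described above (d inputs, 5 hidden neurons, 1 output) can be sketched as a forward pass. The activation functions, weight initialization and function names below are assumptions made for illustration; the text does not specify them, and training is not shown.

```python
import numpy as np

def init_mlp(n_inputs, n_hidden=5, seed=0):
    """Randomly initialize a d -> 5 -> 1 network (initialization assumed)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, size=(n_inputs, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.1, size=(n_hidden, 1)),
        "b2": np.zeros(1),
    }

def forward(params, X):
    """Return P(target) for each row of X."""
    hidden = np.tanh(X @ params["W1"] + params["b1"])   # assumed activation
    logits = hidden @ params["W2"] + params["b2"]
    return 1.0 / (1.0 + np.exp(-logits[:, 0]))          # sigmoid output

params = init_mlp(n_inputs=16)
X = np.random.default_rng(1).normal(size=(4, 16))       # 4 stand-in feature vectors
probs = forward(params, X)
n_weights = sum(p.size for p in params.values())        # 16*5 + 5 + 5 + 1
```

With 16 input features the network has only 91 trainable parameters, which illustrates why a compact feature vector helps limit over-fitting.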

E. Dataset 

The dataset used in this experiment was obtained from IEEE
DataPort and was collected by Abibullaev and Zollanvari at
Nazarbayev University [12]. It contains P300 data from 16
participants recorded with a standard 10-20 EEG electrode headset.
The sampling frequency was 128 hertz, and there were 1000-4000
training examples for each participant. Roughly 1/6 of the samples
were target P300 waves. The data was low-pass filtered
beforehand with a cutoff of 15 hertz. Each training sample was
exactly 76 samples long (approximately 600 milliseconds).

F. Experimental Design 

The 16 subjects’ data was combined into a single dataset to 
test the relative effectiveness of the proposed method in a wide 
variety of conditions. The data was normalized such that the 
RMS value of each signal was set to 1. Per standards set out in 
[8], bundles of 6 input target/non-target signals were averaged 
into single training examples. A balanced dataset with 50% 
target and 50% non-target class feature vectors was randomly 
selected. The resulting balanced dataset was split into 70% 
training data and 30% validation data. 
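
The preprocessing chain described above can be sketched as follows. The function names, the random seed, the handling of incomplete bundles and the order of operations (normalize, then average within each class, then balance and split) are assumptions; the input arrays are synthetic stand-ins rather than EEG data.

```python
import numpy as np

def rms_normalize(X):
    """Scale each row (one signal) so that its RMS value equals 1."""
    rms = np.sqrt(np.mean(X ** 2, axis=1, keepdims=True))
    return X / rms

def average_bundles(X, bundle_size=6):
    """Average consecutive bundles of signals; drop any incomplete bundle."""
    n = (len(X) // bundle_size) * bundle_size
    return X[:n].reshape(-1, bundle_size, X.shape[1]).mean(axis=1)

def balance_and_split(targets, nontargets, train_frac=0.7, seed=0):
    """Draw a 50/50 class-balanced set, then split it into train/validation."""
    rng = np.random.default_rng(seed)
    n = min(len(targets), len(nontargets))
    X = np.vstack([targets[rng.permutation(len(targets))[:n]],
                   nontargets[rng.permutation(len(nontargets))[:n]]])
    y = np.concatenate([np.ones(n), np.zeros(n)])
    order = rng.permutation(2 * n)
    cut = int(round(train_frac * 2 * n))
    return X[order[:cut]], y[order[:cut]], X[order[cut:]], y[order[cut:]]

rng = np.random.default_rng(0)
raw_t = rng.normal(size=(60, 76))     # synthetic stand-in for target epochs
raw_n = rng.normal(size=(300, 76))    # synthetic stand-in for non-target epochs
avg_t = average_bundles(rms_normalize(raw_t))
avg_n = average_bundles(rms_normalize(raw_n))
X_tr, y_tr, X_va, y_va = balance_and_split(avg_t, avg_n)
```

Averaging is applied within each class separately, so every averaged example is built from six signals that share a label.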

For the adaptive chirplet transform tests, the average target 
signal was calculated from the training data. The 16th order 
adaptive chirplet transform was taken on the signal, and 
features for each individual training sample were extracted 
by projecting the 16 chirplets onto the signal (see Fig. 2). The 
resulting inner products were packaged into feature vectors for 
classification. 
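
The projection step can be sketched as a single matrix product. In the sketch below the chirplets are random stand-ins for the 16 chirplets obtained from the average target signal, the function names are hypothetical, and stacking real and imaginary parts is only one possible way of packaging the complex inner products (the text does not say which was used).

```python
import numpy as np

def project_onto_chirplets(signals, chirplets):
    """Inner products <s|g_k> of each signal (row) with each chirplet (row)."""
    return np.asarray(signals) @ np.conj(chirplets).T

def to_real_features(coeffs):
    """Hypothetical packaging: stack real and imaginary parts side by side."""
    return np.hstack([coeffs.real, coeffs.imag])

rng = np.random.default_rng(0)
# Random stand-ins for 16 sampled chirplets of length 76 (not a real ACT).
chirplets = rng.normal(size=(16, 76)) + 1j * rng.normal(size=(16, 76))
signals = rng.normal(size=(5, 76))
coeffs = project_onto_chirplets(signals, chirplets)
features = to_real_features(coeffs)
```

Since the chirplets are computed once from the averaged target signal, extracting features for a new example costs only 16 inner products rather than a full decomposition.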

The downsampling-based feature extraction method was
also utilized to generate a separate training and validation
dataset. Each feature vector had dimensionality 8 as in [6], and
the three classifiers were trained and validated using the
appropriate subsets. The final validation accuracies were used as a
marker of success for each feature extraction method/algorithm
combination.

Each combination of algorithm and feature extraction
method was carried out on each channel of the EEG headset
individually. The channel from which the validation results
were drawn was selected automatically based on training accuracy.

III. Results 

The initial averaged event-related potential waveform from
the dataset yielded expected results. It can be observed in
Fig. 1 that the target waveform average has a clear elevation
and subsequent depression starting at roughly 200 milliseconds,
while the non-target average has no substantial disturbances.


Fig. 1: Averaged P300 Waveforms. Signal-averaged waveforms for
target versus non-target stimuli.


When 6-signal averaging was used on the input vectors, 
there was some observable similarity between the overall 
target average and a given target waveform. There was also 
a corresponding lack of correlation between the overall target 
average and a given non-target waveform (Fig. 2).


Fig. 2: Example target and non-target waveforms superimposed
on a plot of the averaged target waveform for the entire data
set. Panels: Target Training Example vs. Average Target Waveform;
Non-Target Training Example vs. Average Target Waveform.

That said, there is clear interference and noise in the 
example ERP signals that must somehow be accounted for 
in order to be certain of the signal’s class. 

Fig. 3 shows successive ACT approximations of an input
signal, in this case the overall average P300 target waveform.
The set of chirplets that comprises the approximation was
then projected onto each input signal in order to extract a
compressed feature vector that reflects similarity between the
input signal and the average target waveform.


Fig. 3: Successive Approximations of Average Target Waveform.
Successive adaptive chirplet transform approximations of a
target signal.

A representation of the alternative feature extraction method 
(downsampling) is shown in Fig. 4.


Fig. 4: Example Downsampled Waveform. Downsampled vs. full-sampled
version of a signal. Downsampling was performed via averaging
within each of the 8 partitions, a value specified in [6].

After randomly dividing the chirplet-based and downsampling-based
feature vector datasets into training and validation sets
(70-30% split), the panel of machine learning algorithms was
trained and validated on each dataset. The following validation
results were obtained (see Table I):

TABLE I: Validation Accuracies for Algorithm-Feature Extraction
Combinations (Channel 5).

                  Feature Extraction Method
Classifier        Chirplet      Downsample
Neural Net        80.2%         73.2%
Logistic Reg      64.2%         63.8%
K-Centroid        57.7%         63.0%


IV. Discussion 

This investigation successfully addressed the question of
whether the projection of Adaptive Chirplet Transform
components for feature extraction in P300 classification can
outperform conventional downsampling methods. Given that the
neural net algorithm validation accuracies for the Adaptive
Chirplet Transform method and the downsampling method
were 80.2% and 73.2% respectively, it is clear that the
Adaptive Chirplet Transform holds substantial promise for P300
ERP analysis and classification.

Like many state-of-the-art classification methods, the
adaptive chirplet transform/neural network approach was able to
achieve 90-100% accuracies on certain test subjects. The
results shown in Table I, however, show that the ACT-based
approach also generalizes well to a dataset composed of many
test subjects' data. This suggests that the feature extraction
method is able to usefully represent P300 information in
a robust fashion. That said, more diverse testing would be
justified in order to confirm this implication. Classification
of a dataset collected from individuals over multiple testing
sessions and algorithms that can continuously adjust to new
data would optimally reflect the proposed method’s ability to
outperform existing methods in true P300 BCI use-cases.


The only classifier where the proposed methodology had a
lower accuracy than the conventional 8-dimensional
downsampling approach was the K-Centroid algorithm. This could
be because the decision boundary is not represented easily in
Euclidean space, but since neither feature extraction method
worked well with the K-Centroid algorithm, this fact is of
relatively low importance to the overall results.
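
To make the point about Euclidean decision boundaries concrete, the sketch below reads "K-Centroid" as a nearest-centroid classifier, which is an assumption about the method used. The function names and toy data are hypothetical.

```python
import numpy as np

def fit_centroids(X, y):
    """Compute one centroid (mean feature vector) per class."""
    classes = np.unique(y)
    centroids = np.array([X[y == k].mean(axis=0) for k in classes])
    return classes, centroids

def predict_nearest_centroid(X, classes, centroids):
    """Assign each row of X to the class with the closest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(dists, axis=1)]

# Toy 2-D data: class 0 near the origin, class 1 near (1, 1).
X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.2]])
y = np.array([0, 0, 1, 1])
classes, centroids = fit_centroids(X, y)
pred = predict_nearest_centroid(
    np.array([[0.1, 0.0], [1.1, 0.9]]), classes, centroids)
```

With two classes, such a classifier separates them by the hyperplane that perpendicularly bisects the segment joining the two centroids. Any class structure that is not linearly separable in the feature space is therefore beyond its reach, whereas a neural network with a hidden layer can model it.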

While the proposed method has the advantage of computational
efficiency in the classification pipeline, alternative methods
where the adaptive chirplet transform decomposition is
taken for all input signals could yield even higher performance
and assist in the further elucidation of the nature of the P300
ERP. The computational feasibility of this approach, however,
is dubious. The matching pursuit (MP) step in the ACT
algorithm is exceedingly computationally expensive, involving
many multiplications of the input signal with an extraordinarily
large matrix of chirplets (the chirplet dictionary). Further work
on the efficiency of the ACT algorithm (including a study
on the requirements for the chirplet dictionary and the use of
alternative optimizers for fine-tuning) would facilitate such
studies as well as the use of the associated techniques in
practice.

V. Conclusion 

It was found that the proposed adaptive chirplet transform
and neural network-based P300 classification method
outperformed the conventional downsampling-based
method by a substantial margin (80% to 73%) on a
multi-subject 6-signal averaged dataset. State-of-the-art accuracies
of up to 100% were achieved with subject-specific models
with the same signal averaging methods.

These results made clear the utility of the adaptive chirplet 
transform in classifying and analyzing brainwaves, particularly 
the P300 event-related potential.

Areas for future exploration include alternative adaptive
chirplet transform-based feature extraction methods, in
particular the use of the chirplet coefficient and parameter sets
resulting from the decomposition of each individual input
signal. Work on the computational efficiency of the adaptive
chirplet transform beyond the adaptations proposed in this
paper would greatly expedite the research. Research analyzing
electrocorticography signals, fNIRS, tomographic fNIRS,
and implantable brain-machine interface signals using the
adaptive chirplet transform is also of interest.